A large-scale human-annotated collection of short videos for action recognition and event understanding. The dataset covers 339 …